Hierarchical (α_ij, k, m)-anonymity privacy preservation based on multiple sensitive attributes
WANG Qiuyue, GE Lina, GENG Bo, WANG Lijuan
Journal of Computer Applications    2018, 38 (1): 67-72.   DOI: 10.11772/j.issn.1001-9081.2017071863
To address the limitations of anonymizing a single sensitive attribute and its vulnerability to association attacks, an (α_ij, k, m)-anonymity model based on a greedy algorithm was proposed. Firstly, the (α_ij, k, m)-anonymity model was designed to protect information across multiple sensitive attributes. Secondly, the model graded the sensitive attributes into levels according to the sensitivity of their values: with m sensitive attributes, m level tables were built. Thirdly, each level was assigned a specific proportion threshold α_ij by the model. Finally, an (α_ij, k, m)-anonymity algorithm based on a greedy strategy was designed, adopting a locally optimal method to implement the model and improve the degree of data privacy protection. The proposed model was compared with three other models in terms of information loss, execution time, and the sensitivity distance of equivalence classes. The experimental results show that, although the execution time of the proposed model is slightly longer than that of the compared models, its information loss is lower and its degree of privacy protection is higher. The model can resist association attacks and protect data with multiple sensitive attributes.
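The core constraint described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the partitioning algorithm, the level tables, and the exact semantics of α_ij are assumptions here. The sketch checks one equivalence class: it must contain at least k records and, for each of the m sensitive attributes, the fraction of records whose value falls in sensitivity level j must not exceed the threshold α_ij.

```python
from collections import Counter

def satisfies_alpha_k_m(records, levels, alphas, k):
    """Check one equivalence class against an (alpha_ij, k, m)-style constraint.

    records : list of tuples, one sensitive value per attribute (m attributes)
    levels  : levels[i][value] -> sensitivity level j of a value of attribute i
    alphas  : alphas[i][j] -> max allowed fraction of level-j values in the class
    k       : minimum size of the equivalence class
    """
    if len(records) < k:
        return False
    m = len(levels)
    for i in range(m):
        # Count how many records fall into each sensitivity level of attribute i.
        level_counts = Counter(levels[i][rec[i]] for rec in records)
        for j, count in level_counts.items():
            if count / len(records) > alphas[i][j]:
                return False
    return True
```

For example, with one sensitive attribute whose high-sensitivity level 0 is capped at α = 0.5, a class of four records with one level-0 value passes, while a class where two of three records are level 0 fails.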
Entity relationship search over extended knowledge graph
WANG Qiuyue, QIN Xiongpai, CAO Wei, QIN Biao
Journal of Computer Applications    2016, 36 (4): 985-991.   DOI: 10.11772/j.issn.1001-9081.2016.04.0985
Entity search and question answering over text corpora have difficulty joining cues from multiple documents to handle relationship-centric search tasks. Structured querying over a knowledge base can resolve this problem, but it suffers from poor recall because of the heterogeneity and incompleteness of the knowledge base. To address these problems, the knowledge graph was extended with information from textual corpora, and a corresponding triple pattern with textual phrases was designed for uniform querying of the knowledge graph and textual corpora. Accordingly, a model for automatic query relaxation and for scoring query answers (tuples of entities) was proposed, together with an efficient top-k query processing strategy. Comparison experiments were conducted with two classical methods on three benchmarks, covering entity search, entity-relationship search and complex entity-relationship queries, using a combination of the Yago knowledge graph and the entity-annotated ClueWeb '09 corpus. The experimental results show that the entity-relationship search system with query relaxation over the extended knowledge base outperforms the comparison systems by a large margin: the Mean Average Precision (MAP) is improved by more than 27%, 37% and 64% respectively on the three benchmarks.
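The interplay of query relaxation and answer scoring can be sketched as follows. This is an illustrative simplification, not the paper's model: the penalty values, the score discounting scheme, and the shape of the answer lists are all assumptions. Each query variant (the exact query or one of its relaxations) produces scored entity tuples; answers from relaxed variants are discounted by a penalty, and the best k answers overall are kept.

```python
import heapq

def top_k_answers(query_variants, k):
    """Merge answers from an exact query and its relaxations, keeping the best k.

    query_variants : list of (penalty, answers) pairs, where penalty in [0, 1]
        discounts answers coming from a relaxed variant of the query, and
        answers is a list of (entity_tuple, score) pairs from that variant.
    Returns the k best (entity_tuple, discounted_score) pairs.
    """
    best = {}  # entity tuple -> best discounted score seen so far
    for penalty, answers in query_variants:
        for entities, score in answers:
            discounted = score * (1.0 - penalty)
            if discounted > best.get(entities, float("-inf")):
                best[entities] = discounted
    return heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
```

A real top-k strategy would process variants incrementally and stop once no unseen answer can beat the current k-th score; the dictionary-merge above only shows the scoring side.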
Deep Web resource selection using topic model
WANG Qiuyue, CAO Wei, SHI Shaochen
Journal of Computer Applications    2015, 35 (9): 2553-2559.   DOI: 10.11772/j.issn.1001-9081.2015.09.2553
Federated search is a widely used technique for finding information on the Deep Web. Given a user query, one of the challenges for a federated search system is to select the set of resources most likely to return relevant results for the query. Most existing resource selection methods are based on text matching between the query and sample documents from each resource, and typically suffer from missing vocabulary or incomplete information. To alleviate the problem of incomplete information, a resource selection approach based on the Latent Dirichlet Allocation (LDA) topic model was proposed. First, the topic probability distributions of the resources and the query were inferred with the LDA topic model. Then the similarities between the topic distributions of the resources and that of the query were calculated to rank the resources. By mapping both the resources and the query into a low-dimensional topic space, the approach alleviates the information loss caused by the sparsity of the high-dimensional word space. Experiments were conducted on the test sets of the TREC FedWeb 2013 and 2014 Tracks, and the results were compared with those of the other participants in the Tracks. The results on the TREC FedWeb 2013 Track show that the LDA-based approach outperforms the best result of the other participants by 24%; the results on the TREC FedWeb 2014 Track show that it outperforms the best results of the traditional text-matching-based resource selection methods by 22% for small-document strategies and 43% for big-document strategies respectively. In addition, using sampled snippets rather than full documents to generate the big-document representation of a resource significantly improves the efficiency of the system, making the proposed approach more feasible and applicable in practice.
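The ranking step described above can be sketched in a few lines, assuming the LDA inference has already produced a topic distribution for each resource (e.g. over its sampled documents) and for the query. Cosine similarity is used here as a stand-in similarity measure; the paper's exact choice of measure is not stated in the abstract.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic probability distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def rank_resources(resource_topics, query_topics):
    """Rank resources by similarity of their topic distribution to the query's.

    resource_topics : dict mapping resource name -> topic distribution
    query_topics    : topic distribution inferred for the query
    """
    return sorted(resource_topics,
                  key=lambda r: cosine(resource_topics[r], query_topics),
                  reverse=True)
```

Because both sides live in the same low-dimensional topic space, a resource can rank highly even when its sampled documents share no literal vocabulary with the query, which is the point of the approach.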
Big data benchmarks: state of the art and trends
ZHOU Xiaoyun, QIN Xiongpai, WANG Qiuyue
Journal of Computer Applications    2015, 35 (4): 1137-1142.   DOI: 10.11772/j.issn.1001-9081.2015.04.1137

A big data benchmark is urgently needed by customers, industry and academia to evaluate big data systems, improve current techniques and develop new ones. A number of prominent works from the last several years were reviewed: their characteristics were introduced and their shortcomings analyzed. On this basis, some suggestions for building a new big data benchmark are provided: 1) component benchmarks and end-to-end benchmarks should be used in combination, the former to test individual tools inside the system and the latter to test the system as a whole, with component benchmarks serving as ingredients of the whole big data benchmark suite; 2) besides SQL queries, workloads should be enriched with complex analytics to cover different application requirements; 3) in addition to performance metrics (response time and throughput), other metrics should be considered, including scalability, fault tolerance, energy saving and security.
